Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules
Abstract
We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family ℛ of grid-regions in the plane associated with the pair of attributes. For R ∈ ℛ, the data can be split into two classes: data inside R and data outside R. We compute the region R_opt ∈ ℛ that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic. We give efficient algorithms for cases in which ℛ is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm for the first case has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors. Tests show that our approach can create small-sized decision trees.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, 1996
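The objective described in the abstract (choose the region R whose inside/outside split has minimum entropy) can be illustrated with a small sketch. This is not the authors' algorithm: it is a brute-force search over axis-aligned rectangles only, and it assumes a binary objective attribute and pre-bucketed grid counts. The function names (`entropy`, `split_entropy`, `best_rectangle`) and the input layout are hypothetical, chosen only for illustration.

```python
import math

def entropy(pos, total):
    """Binary entropy (in bits) of a class with `pos` positive tuples out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def split_entropy(pos_in, n_in, pos_all, n_all):
    """Weighted average entropy of the two classes: inside R and outside R."""
    n_out = n_all - n_in
    pos_out = pos_all - pos_in
    return (n_in / n_all) * entropy(pos_in, n_in) + (n_out / n_all) * entropy(pos_out, n_out)

def best_rectangle(count, positive):
    """Exhaustively search grid rectangles for the minimum-entropy split.

    Assumed inputs (illustrative layout, not taken from the paper):
      count[i][j]    = number of tuples falling in grid cell (i, j)
      positive[i][j] = number of those tuples whose objective attribute is true
    Returns (entropy, (row_lo, row_hi, col_lo, col_hi)) with inclusive bounds.
    """
    rows, cols = len(count), len(count[0])
    n_all = sum(map(sum, count))
    pos_all = sum(map(sum, positive))
    best = (float("inf"), None)
    for r0 in range(rows):
        # Column sums accumulated over the row band r0..r1.
        col_n = [0] * cols
        col_p = [0] * cols
        for r1 in range(r0, rows):
            for j in range(cols):
                col_n[j] += count[r1][j]
                col_p[j] += positive[r1][j]
            for c0 in range(cols):
                n_in = pos_in = 0
                for c1 in range(c0, cols):
                    n_in += col_n[c1]
                    pos_in += col_p[c1]
                    if n_in == 0 or n_in == n_all:
                        continue  # one side is empty, so this is not a real split
                    e = split_entropy(pos_in, n_in, pos_all, n_all)
                    if e < best[0]:
                        best = (e, (r0, r1, c0, c1))
    return best

# Toy 3x3 grid: every positive tuple sits in the centre cell.
count = [[2, 2, 2], [2, 4, 2], [2, 2, 2]]
positive = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
print(best_rectangle(count, positive))  # (0.0, (1, 1, 1, 1))
```

The exhaustive search takes time proportional to rows² × cols², which is workable only for coarse grids. The abstract's claim is that efficient algorithms exist for richer region families such as x-monotone connected regions; those algorithms are not reproduced here.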
Similar resources
Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules
1 Introduction. We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric...
Knowledge Discovery from Health Data Using Weighted Aggregation Classifiers
Introduction. The automatic construction of classifiers is an important research problem in data mining, since a classifier provides not only good predictions but also a characterization of the given data in a form easily understood by a human. A decision tree [4] is a classifier widely used in real applications; it is easy to understand and can be constructed efficiently by using a method based ...
Efficient Construction of Regression Trees with Range and Region Splitting
We propose an efficient way of constructing regression trees in order to predict the objective numeric attribute values of given tuples. A regression tree is a rooted binary tree such that each internal node contains a test, which can be expressed as an RDB query, for splitting tuples into two disjoint classes and passing data in each class down to the left or right subtree. The mean of the objec...
Mining Optimized Support Rules for Numeric Attributes
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. O...
Mining Optimized Association Rules with Categorical and Numeric Attributes
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. ...
Publication date: 1996